Segmental Spatiotemporal CNNs for Fine-Grained Action Segmentation
Authors
Abstract
Generalizes our temporal convolutions. Unifies our unary and temporal models. New results on GTEA, 50 Salads, and JIGSAWS.
1) Improved Dense Trajectories (IDT) + Bag of Words do not work well on fine-grained action recognition (FGAR).
2) Spatial and spatiotemporal CNNs have been proposed, but IDT (or CNN+IDT) is typically superior [Sun ICCV15, Heilbron CVPR15, Simonyan ICLR15, Jain CVPR15, Tran ICCV15, Karpathy CVPR14, ...].
3) Current action localization results are poor, implying that these models capture scene information but not the essence of what defines an action.
Related resources
Segmental Spatio-Temporal CNNs for Fine-grained Action Segmentation and Classification
Joint segmentation and classification of fine-grained actions is important for applications in human-robot interaction, video surveillance, and human skill evaluation. However, despite substantial recent progress in large scale action classification, the performance of state-of-the-art fine-grained action recognition approaches remains low. In this paper, we propose a new spatio-temporal CNN mod...
Face Parsing via Recurrent Propagation
Face parsing is an important problem in computer vision that finds numerous applications including recognition and editing. Recently, deep convolutional neural networks (CNNs) have been applied to image parsing and segmentation with the state-of-the-art performance. In this paper, we propose a face parsing algorithm that combines hierarchical representations learned by a CNN, and accurate label...
Efficient Hardware Realization of Convolutional Neural Networks using Intra-Kernel Regular Pruning
The recent trend toward increasingly deep convolutional neural networks (CNNs) leads to a higher demand of computational power and memory storage. Consequently, the deployment of CNNs in hardware has become more challenging. In this paper, we propose an Intra-Kernel Regular (IKR) pruning scheme to reduce the size and computational complexity of the CNNs by removing redundant weights at a fine-g...
A Closer Look at Spatiotemporal Convolutions for Action Recognition
In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework o...
An improved joint model: POS tagging and dependency parsing
Dependency parsing is a form of syntactic parsing in natural language processing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-Of-Speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline mode. Unfortunately, in pipel...